Interpretable Machine Learning of Two‐Photon Absorption

Abstract Molecules with strong two‐photon absorption (TPA) are important in many advanced applications such as upconverted laser and photodynamic therapy, but their design is hampered by the high cost of experimental screening and accurate quantum chemical (QC) calculations. Here a systematic study is performed by collecting an experimental TPA database with ≈900 molecules, analyzing with interpretable machine learning (ML) the key molecular features explaining TPA magnitudes, and building a fast ML model for predictions. The ML model has prediction errors of similar magnitude compared to experimental and affordable QC methods errors and has the potential for high‐throughput screening as additionally validated with the new experimental measurements. ML feature analysis is generally consistent with common beliefs which is quantified and rectified. The most important feature is conjugation length followed by features reflecting the effects of donor and acceptor substitution and coplanarity.


Supplementary information
Section 1. The dataset from literature reports Figure S1. TPACS values of some common commercially available dyes from different sources. [1] The standard deviation was noted above the bar. These discrepancies are possibly introduced because the TPACS measurements depend on the femtosecond laser pulse, which may vary from instrument to instrument.

MFF Features
The concept of Molecular Fragment Fingerprint [2] was inspired by ECFP provided by Deepchem [3] , while the hash function of the features was bypassed and the interpretable molecular fragments were generated. An occurrence frequency threshold was set to drop features that appeared too few times. Thus, useful substructures of a molecule were extracted. There are 564 MFF features included in our dataset.

RDKit Features
We use all the available descriptors given by RDKit. [4] There are 107 RDKit features included.

'Num-Conju-Stru'
The number of conjugation structures in one molecule. Each conjugation structure contains more than one atoms in the π-system. represents this feature for a particular molecule, where the subscript is the index of the molecule in the dataset.

'Conju-Num-Atom-All'
The maximum number of atoms of all conjugation structures in one molecule. This descriptor was not included in the first round of machine learning.
The subscript is the index of conjugation structure in the th molecule.

'Conju-Num-Atom-Ratio'
The number of atoms in all conjugation structures divided by the number of atoms in one molecule.
∑ is the number of atoms in the th molecule.

'Conju-Num-Atom-Individual'
The number of atoms in conjugation structures in one molecule. If the molecule contains several independent conjugated sub-structures, the number of atoms of those independent ones should be considered as well.

'Conju-Wt-Ratio'
The sum of weight of all conjugation structures divided by the molecular weight.
∑ is the molecular weight of the th molecule.

'Conju-Wt-Part'
The maximum weight of all conjugation structures in one molecule. The weight of conjugation structure is calculated by adding up the atomic weight of atoms it contains.
∑ stands for the number of atoms in the th conjugation structure of the th molecule. The subscript means the index of atom in the th conjugation structure of the th molecule.

'Conju-Wt-Ave'
The maximum atomically averaged weight of all conjugation structures in one molecule. The atomically averaged weight of conjugation structure is calculated by dividing the weight of conjugation structure by the number of atoms it contains.

'Conju-Max-Distance'
The maximum conjugated length of all conjugation structures in one molecule. First, the distance matrix was obtained by RDKit by "Chem.rdmolops.GetDistanceMatrix(mol)". Then, the conjugated sub-structures were selected and the maximum of its distance matrix was calculated.
{ } is the shortest distance between the th and the th atoms in the th conjugation substructure of the th molecule.

'Conju-Branching-Index '
The maximum branching index of all conjugation structures in one molecule. The branching index of one conjugation structure is defined as the number of atoms it contains dividing by half of the sum of degrees of atoms.
∑ stands for the degree of the k th atom in the j th conjugation structure of the i th molecule. The degree of one atom is equal to the number of its neighbor atoms.

'Conju-Branch-Ratio'
We count the number of possible conjugated paths of the maximum conjugated length. If this number is beyond 1, branching structure must exist in the molecule. We then try to determine if the branching happens at the middle of the conjugated system or at the rim of the system. First, we find two conjugated paths of the maximum conjugated length, AB and AB", sharing a common end atom A. We then find the other end atoms of the two paths, B and B" respectively. We calculated the distance between B and B" L b .
As a result, a "Conju-Branch-Ratio" close to 1 corresponds to a branching in the middle of the molecule.

'Full-Mol-Wiener-Index'
Wiener Index of the whole molecule. Wiener Index is calculated by the formula below: ∑ ∑ is the number of nodes in the graph, and is the shortest distance between the th and the th nodes.
The Wiener Index of a molecule is calculated by the formula below: is the shortest distance between the th and the th atoms in the th molecule.

'Conju-Stru-Wiener-Index'
Wiener Index of the conjugation structure with the maximum size in one molecule.

∑ ∑
The subscript is the index of the conjugation structure containing maximum number of atoms in the th molecule.

'Conju-Stru-VSA'
The approximate surface area of all conjugation structures in one molecule. If we take the shape of each atom to be a sphere with radius equal to the van der Waals radius, we obtain the van der Waals surface area (VSA) for each atom. The sum of the VSA of each atom gives the molecular VSA.
Let us consider a molecule of n atoms. Each atom has a van der Waals radius of R i , and let B i denotes the set of all atoms bonded to the atom i. Distance between sphere A and B is d i . b ij is a pre-defined reference bond length between atom i and atom j. We will neglect the effect of atoms not related by a bond to the atom i and define the VSA for atom i, denoted by V i , to be:

'Conju-Branch-Num'
Half of the number of paths with the maximum distance ('Conju-Max-Distance') in the conjugated structure.

'Conju-sp2N-Num'
The number of N atom with sp2 hybridization in all conjugation structures in one molecule.

'Apparent-Conju-Electron-Count'
The maximum apparent electron counts of all conjugation structures in one molecule. The apparent electron count is calculated by adding up all the electron contributed by the atoms to the conjugation structure.
∑ stands for the number of electrons the th atom "apparently" contributes to the th conjugation structure in the th molecule as shown in the Other heavy atoms (S, P) 2 This feature is calculated by the formula below:

'Conju-Elec-Influence'
The electron-influence is calculated by summing up the product of the atom"s number of neighboring non-H atoms and the number of electrons it contributes to the conjugation structure.
∑ stands for the number of neighboring non-H atoms of the th atom in the th conjugation structure of the th molecule.
The definition of is already mentioned before.

'Conju-Elec-Influence-Ave'
The atomic normalization of electron-influence of the conjugation structures: ∑

'Conju-Elec-Distance-Coef'
The maximum of electronic distance coefficient of all conjugation structures in one molecule. The electronic distance coefficient is calculated by the formula below:

∑ ∑
Note that starts from so that all atom pairs are calculated only once.
is the shortest distance between the th and the th atoms in the th conjugation substructure of the th molecule.
The natural logarithm of result is taken as the feature:

'Conju-Elec-Distance-Coef-Norm'
The maximum of normalized electronic distance coefficient of all conjugation structures in one molecule. The normalization is done by dividing the electronic distance coefficient by the number of atom pairs.

( )
This value is relatively small so that it is not needed to take the form of logarithm:

'Conju-MultiElec-Distance-Coef'
The maximum of multi-electronic distance coefficient of all conjugation structures in one molecule. Its calculation is similar to that of the "Conju-Elec-Distance-Coef", but the number of electrons contributing to the conjugation structure is subtracted by 1:

'Conju-MultiElec-Distance-Coef-Norm'
The maximum of normalized multi-electronic distance coefficient of all conjugation structures in one molecule. The normalization is also done by dividing the multi-electronic distance coefficient by the number of atom pairs.

MFF-based MOE features
Five categories of properties could be calculated in our featurization algorithm (MFF-MOE): 'Apperant-Elec-Count', 'PEOE-Charge', 'EState-Indice', 'LogP', 'MR'. In this section, only the longest conjugated sub-structure is considered. The value of the properties of MFF is obtained by the summation of values of its containing atoms.
Taking "PEOE Charge" as an example, we sum up the Gasteiger atomic charges of atoms in an MFF fragment. The maximum and minimum of these summed PEOE charges of all fragments were then extracted as two molecular features: "PEOE-Charge-Max" and "PEOE-Charge-Min".
LogP is the logarithm of oil (octanol)-water partition coefficient of a molecule. The atomic attribution of LogP effectively explores the local polarity of a molecule. The summation of atomic LogP to MFF can identify polar groups in the molecule. Similarly, the MR is the polarizability of the molecule determined by molar refractivity. The atomic attribution of the MR highlights the polarizability of each atom in a molecule, while its summation to the MFF shows the polarizability of a conjugated fragment.
The 'Apperant-Elec-Count' is already defined in the previous section. The atomic PEOE charge, EState indice, contribution of LogP, and contribution of MR are calculated by the RDKit toolkit. The "Estate" parameters in our models are NOT electronic states numbers.
The following features have 'P x ' in the names. The 'P x ' stands for names of different properties as shown in the table below. " means the th atomic property of the th atom in the th molecule.

'P x -Sum'
The sum of atomic properties of atoms in the conjugation structure. If multiple conjugation structures are present in one molecule, then the value of the one with the maximum number of atoms will be used (similarly hereinafter).

'P x -Ave'
The atomic averaged value of atomic properties of atoms in the conjugation structure.

'P x -Max'
The maximum value of properties of single atoms and all possible MFF fragments in the conjugation structure. The sum of atomic properties in a MFF fragment is calculated first as fragment property. Then the maximum value is picked out from all the single atom properties together with all fragment properties.

∑
The fragment properties will be noted as " " which means the property of the th fragment in the th conjugation structure of the th molecule.
represents the atom index of the th fragment in the th conjugation structure of the th molecule. means the number of fragments in the th conjugation structure of the th molecule.

'P x -Min'
The minimum value of properties of single atoms and all possible MFF fragments in the conjugation structure.

'P x -Delta'
The difference between the maximum and minimum properties of single atoms and all possible MFF fragments in the conjugation structure.

'P x -Weighted'
The weighted atomic properties of all atoms in the conjugation structures. The weighted property is calculated by summing up the product of the atom"s number of neighboring non-H atoms and the respective atomic property.
∑ is the number of neighbor atoms of the k th atom of the j th conjugation structure in the i th molecule.

'P x -Weighted-Ave'
The atom averaged weighted atomic properties of all atoms in the conjugation structures.

'P x -PositiveDisCoef'
The one order distance coefficient of the atomic properties of all atoms in the conjugation structures. The one order distance coefficient is calculated by the formula below:

∑ ∑
It is noted that starts from so that all atom pairs are calculated only once.

'P x -PositiveDisCoef-PairMean'
The averaged one order distance coefficient of the atomic properties of all atoms in the conjugation structures.

'P x -NegativeDisCoef'
The negative one order distance coefficient of the atomic properties of all atoms in the conjugation structures. The negative one order distance coefficient is calculated by the formula below: ∑ ∑

'P x -NegativeDisCoef-PairMean'
The averaged negative one order distance coefficient of the atomic properties of all atoms in the conjugation structures.

'P x -GradSum'
The sum of gradient of the atomic properties of all atoms in the conjugation structures.

'P x -GradSum-PairMean'
The averaged gradient of the atomic properties of all atoms in the conjugation structures.

'P x -LaplaceSum'
The sum of Laplace gradient of the atomic properties of all atoms in the conjugation structures.

'P x -Laplace-PairMean'
The averaged laplace gradient of the atomic properties of all atoms in the conjugation structures.

'P x -MaxMinDisRatio'
The ratio between the distance of the single atoms or fragments with the maximum property and the minimum property over the maximum distance of conjugation structure.

{ }
is the atomic index of the single atom or the fragment with maximum and minimum properties.
stands for the longest distance of two atoms in the fragments with maximum and minimum properties.

'DAratio'
The ratio between the distance of the single atoms with the maximum PEOE charge and the minimum PEOE charge and the maximum distance of conjugation structure.
{ } means the atomic index of the single atom with maximum and minimum PEOE charge. stands for the longest distance of two atoms in the atoms with maximum and minimum PEOE charge.

Section 3. Feature importance
For LASSO, the magnitude of coefficients can be used as importance of the corresponding feature. For the GBRT and XGB Regressor, SHAP [5] , a Python library to calculate Shapley values, was implemented to generate more interpretable feature importance. As for each regressor, 240 rounds of cross validation with randomly generated training-testing splits were carried out, and the feature importance of each round was collected, multiplied by the R 2 score of respective round, and then added up as weighted accumulated feature importance.
Then the accumulated feature importance of each regressor was sorted and ranked. The         (Table S4).

Section 6. Comparison between conjugated length and conjugated area
The conjugated length and the conjugated area are correlated and are thus both correlated with the TPACS. We carefully compared these two features to show that the conjugated length is a better descriptor to choose.    Diagonalizing this Hamiltonian can give a series of orbitals.
By assuming every p orbital contributing one electron, we selected the HOMO−1, HOMO, LUMO, LUMO+1 orbitals to be the n−1, n, n+1, n+2 orbitals, respectively. We used these four orbitals to construct five Slater determinants to represent a ground state and four excited states, ignoring configuration interactions.
We calculated the transition dipole moments between these states in an arbitrary unit and calculated TPACS according to where is the TPA cross section, is the transition dipole moment between the two states i, j or the dipole moments of the ground or excited state when i=j; is the energy difference between these two states. g stands for the ground state; f stands for the final state; m stands for a mediating state in the TPA process based on the second order perturbation theory.
The TPA cross section of such a model was then calculated using numerical simulations. We are interested in the slope of lg(TPACS) vs lg(n). Figure S11. Model 1D conjugation system with alternative bond lengths.

Section 9. TDDFT Computational Details
The Gaussian 16 program package was used to calculate the transition dipole moment between states by time-dependent density functional theory (TDDFT) using the CAM-B3LYP functional, aug-cc-pVDZ basis set after all geometries were optimized at wB97xD/def2-TZVP level. All of two-photon absorption spectrum simulations were performed by a code using the MATLAB program used in our previous work [7] . The transition dipole moment matrix and the oscillator strengths had been prepared by TDDFT calculations and extracted by Multiwfn. We calculated the TPACS using sum-over-states (SOS) method using the lowest 20 singlet excited states. [7][8]

Section 10. Principle component analysis
The The mixing among the features are shown in Table S6.  The mixing of conjugation length to the "MR-Max" is shown in the column "New Feature 2" in Table S6. Consistent with the notion of the correlativity, the SHAP plot of "PCAmain-MR-Max" after PCA ( Figure S12d) changed significantly from the original one, while the SHAP of other features did not change much. The SHAP plot showed that the TPACS is positively correlated to the "PCAmain-MR-Max".
We drew several MFF structures with extreme feature values to probe the chemical information of this feature (Figures S12c,d). All the structures are the conjugation backbones.
The conjugated structure connected through carbon-carbon double bond, triphenylamine groups seem to increase the SHAP value, while a direct connection between two aromatic rings via a single bond has a negative effect. This later observation can be easily verified by analyzing the number of "ccc(cc)-c(c)c" MFF in molecule as a feature ( Figure S14c), which showed negative SHAP values to the TPACS. These connections affect coplanarity of the conjugation system. In the single bond linkage between two benzene rings, the steric repulsion of adjacent hydrogen atoms leads to a dihedral angle of ~60 o that decreases the degree of conjugation.

Conju-Branch-Ratio
To isolate the effect of branching, we proposed a conjugated descriptor called "Conju-Branch-Ratio" (Figure S15a) that is only weakly correlated to the conjugation length and able to tell the difference between linear and branched structures. The "Conju-Branch-Ratio" is close to 1 when the branching point appeared in the middle of a conjugated system, while it is close to 0 for a linear structure.
The Conju-Branch-Ratio is more suitable to our task of analyzing multipolar effect than several other structural shape features from the RDKit library ("Kappa3", "Chi3n", and "Conju-Stru-Wiener-Index"), as the latter three are all strongly correlated to the conjugation length ( Figure S15).

DAratio
The DAratio measures the distance between the donor and acceptor groups in a conjugated system by finding the distance between centers with the most positive and most negative PEOE charges. This distance is then divided by the conjugation length to give the "DAratio". A "DAratio" close to 0.5 corresponds to D-A-D or A-D-A structure, while a "DAratio" close to 1 corresponds to a D-A structure. A "DAratio" close to 0 indicates either very close D-A groups or the lack of any D or A groups.

Triphenylamine core
The triphenylamine core has a positive effect to increase TPACS as shown by the SHAP plot in Figures S14a-b, which happens to be a branching core and partly contributes to the impression of beneficial branching effect.
This triphenylamine core has many reasons to be a designer structure motif for obtaining high TPACS: (1)

Is there any significant effect from the solvent or test method used in experiment?
The descriptor of the polarity of the solvent, "ET(30)" (Figure 4f) shows that the solvent used in TPA testing would only make a small contribution to the testing results. A larger ET(30) value representing a more polar solvent gives a slightly more negative contribution to the lg(TPACS).
The dependence on the testing wavelength (Figure 4e) showed that longer wavelength correlated with higher TPACS.
We also counted the measurement methods used, the majority of which are Z-scan and twophoton excited fluorescence (TPEF) in Figure S18. Similar TPACS distributions were observed for the two methods but the sets of molecules with the highest TPACS were mostly measured by Z-scan, possibly because their emission quantum efficiencies are too low to detect their fluorescence, due to the energy gap law. Figure S18. Distribution of TPACS measured by the Z-scan and TPEF methods.

Section 15. Comparison between the machine learning performance with and without QM descriptors
Molecules in this dataset were geometry-optimized and descriptors like excitation energy and HOMO-LUMO gap were calculated using time-dependent DFT approach. ZINDO is also adopted to calculate some of the same quantities. Furthermore, some interpretable descriptors based on electron-hole analysis and electrostatic potential analysis by using Multiwfn software were also put into the feature matrix. The structure optimization was first performed by xtb and then refined by Gaussian using B3LYP(D3)/6-31g*. TDDFT calculation was implemented using PBE0/6-31g*.
Exactly the same pipeline of machine learning procedure was employed to extract important features. 120 splits of training and testing sets were randomly generated to evaluate the model performances with a train-test ratio of 85:15. We combined the feature importance indexes of the three regressors (LASSO, GBRT and XGBOOST) into a weighted one (SI Section 3, Figures   S1), which was used to remove the least important features one at a time from the feature matrix.
As a result, the performance of feature selection was shown in Figure S21, which showed that 20 features are sufficient to describe the system (Table S7) with the XGBoost performance of MAE of 0.38 and R 2 score of 0.48. Comparing with the performance based on MFF-MOE results, the QM-based machine learning is slightly less performing. (Table S8) The important features are listed below.
We prefer the use of the model without these QM descriptors as the QM-based models require additional calculations. Nevertheless, the QM-based results showed that the energy gaps and transition dipole moments are indeed significant, agreeing with the few-state models.
Electrostatic potential analysis [9] we employed electrostatic potential (ESP) to intuitively describe electrostatic interaction characteristics. This analysis will provide us a general understanding on the basic character of intermolecular interaction and local polarity.
Hole-electron analysis [10] In order to explore more useful descriptor of electronic excitations, we employed Multiwfn 3.8 to show a definitive picture about distribution of the hole and electron, which described where the excited electron leaves and arrives, respectively. To save computational resources, only the excited states of the brightest states of one-photon absorption (St1max) and two-photon absorption (St2max) were considered.